Identification of consensus patterns in unaligned DNA sequences known to be functionally related
Authors
Abstract
We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site, and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix, i.e. the one with the lowest probability of occurring by chance, out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
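The matrix representation described in the abstract can be illustrated with a minimal Python sketch. It assumes the candidate sites are already aligned and of equal length, and it scores a matrix by its information content against a uniform background of 0.25 per base. That score is an assumed stand-in for "lowest probability of occurring by chance"; the abstract does not give the exact statistic. The function names and the example sites are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"  # one matrix row per base


def frequency_matrix(sites):
    """Build a base-by-position frequency matrix from equal-length sites.

    Returns a dict mapping each base to a list of per-position frequencies.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = {base: [0.0] * length for base in BASES}
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        for base in BASES:
            matrix[base][pos] = counts[base] / len(sites)
    return matrix


def information_content(matrix, background=0.25):
    """Sum over positions and bases of f * log2(f / background), in bits.

    Assumed scoring rule for illustration only: higher values mean the
    matrix is less likely to arise from random sequence.
    """
    length = len(matrix[BASES[0]])
    total = 0.0
    for pos in range(length):
        for base in BASES:
            freq = matrix[base][pos]
            if freq > 0:  # treat 0 * log(0) as 0
                total += freq * math.log2(freq / background)
    return total


if __name__ == "__main__":
    # Made-up example sites, not data from the paper.
    sites = ["CTGT", "CTGA", "CTGT", "ATGT"]
    matrix = frequency_matrix(sites)
    for base in BASES:
        print(base, matrix[base])
    print("information content (bits):", information_content(matrix))
```

A search over unaligned sequences would build many such matrices, one per candidate choice of sites, and keep the best-scoring ones. This sketch covers only the construction and scoring of a single matrix.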
Related resources
Multi-alphabet consensus algorithm for identification of low specificity protein-DNA interactions.
A method for the identification and characterization of protein-DNA interactions is presented. We have developed an approach for finding unknown multiple patterns that occur imperfectly in a set of several sequences. The pattern may contain letters from the nucleotide alphabet (A, C, G and T) including ambiguous characters (A/C, A/G, A/T; A/C/G, etc.). This method reveals weak DNA signals on an...
Learning Consensus Patterns in Unaligned DNA Sequences Using a Genetic Algorithm
We use a floating point GA to learn a classification rule which discriminates a set of related, unaligned DNA sequences known to contain a biological signal from other sequences which do not contain the signal. The classification rule learned by the GA is in the form of a DNA specificity matrix with a fixed threshold. We translate the matrix into a consensus pattern by using the matrix to align the p...
Recognition of multiple patterns in unaligned sets of sequences. Comparison of kernel clustering method with other methods
MOTIVATION: Transcription factor binding sites often differ significantly in their primary sequence and can hardly be aligned. Often one set of sites can contain several subsets of sequences that follow not just one but several different patterns. There is a need for sensitive methods to reveal multiple patterns in unaligned sets of sequences. RESULTS: We developed a novel method for analysis o...
Subtle Signal Discoveries in Unaligned Molecular Sequences Using Self-Organizing Neural Networks
In this paper, we study the problem of subtle signal discoveries in unaligned DNA and protein sequences. Motifs, also known as approximate common substrings, are good examples of subtle signals in DNA and protein sequences. The problem of motif identification in DNA and protein sequences has been studied for many years in the literature. Major hurdles at this point include computational complex...
Motif discoveries in unaligned molecular sequences using self-organizing neural networks
In this paper, we study the problem of motif discoveries in unaligned DNA and protein sequences. The problem of motif identification in DNA and protein sequences has been studied for many years in the literature. Major hurdles at this point include computational complexity and reliability of the search algorithms. We propose a self-organizing neural network structure for solving the problem of ...
Journal title:
- Computer applications in the biosciences : CABIOS
Volume 6, Issue 2
Pages -
Publication date 1990